System and method for communication using interactive avatar

ABSTRACT

A video communication system that replaces actual live images of the participating users with animated avatars. A method may include selecting an avatar, initiating communication, capturing an image, detecting a face in the image, determining facial characteristics from the face, including eye movement and eyelid movement of a user indicative of direction of user gaze and blinking, respectively, converting the facial features to avatar parameters, and transmitting at least one of the avatar selection or avatar parameters.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a Continuation of U.S. application Ser. No. 13/996,230, filed on Jun. 20, 2013, which is a National Phase Application of PCT Application No. PCT/CN2012/000461, filed on Apr. 9, 2012, which in turn claims priority from PCT/CN2011/084902, filed on Dec. 29, 2011, the entire disclosures of which are incorporated herein by reference.

FIELD

The present disclosure relates to video communication and interaction, and, more particularly, to a system and method for communication using interactive avatars.

BACKGROUND

The increasing variety of functionality available in mobile devices has spawned a desire for users to communicate via video in addition to simple calls. For example, users may initiate “video calls,” “videoconferencing,” etc., wherein a camera and microphone in a device transmit audio and real-time video of a user to one or more other recipients such as other mobile devices, desktop computers, videoconferencing systems, etc. The communication of real-time video may involve the transmission of substantial amounts of data (e.g., depending on the technology of the camera, the particular video codec employed to process the real time image information, etc.). Given the bandwidth limitations of existing 2G/3G wireless technology, and the still limited availability of emerging 4G wireless technology, the proposition of many device users conducting concurrent video calls places a large burden on bandwidth in the existing wireless communication infrastructure, which may impact negatively on the quality of the video call.

BRIEF DESCRIPTION OF DRAWINGS

Features and advantages of various embodiments of the claimed subject matter will become apparent as the following Detailed Description proceeds, and upon reference to the Drawings, wherein like numerals designate like parts, and in which:

FIG. 1A illustrates an example device-to-device system consistent with various embodiments of the present disclosure;

FIG. 1B illustrates an example virtual space system consistent with various embodiments of the present disclosure;

FIG. 2 illustrates an example device consistent with various embodiments of the present disclosure;

FIG. 3 illustrates an example face detection module consistent with various embodiments of the present disclosure;

FIG. 4 illustrates an example system implementation in accordance with at least one embodiment of the present disclosure; and

FIG. 5 is a flowchart of example operations in accordance with at least one embodiment of the present disclosure.

Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.

DETAILED DESCRIPTION

By way of overview, the present disclosure is generally directed to a system and method for video communication and interaction using interactive avatars. A system and method consistent with the present disclosure generally provide detection and/or tracking of a user's eyes during active communication, including the detection of characteristics of a user's eyes, including, but not limited to, eyeball movement, gaze direction and/or point of focus of the user's eyes, eye blinking, etc. The system and method are further configured to provide avatar animation based at least in part on the detected characteristics of the user's eyes in real-time or near real-time during active communication.

In one embodiment, an application is activated in a device coupled to a camera. The application may be configured to allow a user to select an avatar for display on a remote device, in a virtual space, etc. The device may then be configured to initiate communication with at least one other device, a virtual space, etc. For example, the communication may be established over a 2G, 3G, or 4G cellular connection. Alternatively, the communication may be established over the Internet via a WiFi connection. After the communication is established, the camera may be configured to start capturing images. Facial detection is then performed on the captured images, and facial characteristics are determined. The detected face/head movements, including movement of the user's eyes and/or eyelids, and/or changes in facial features are then converted into parameters usable for animating the avatar on the at least one other device, within the virtual space, etc. At least one of the avatar selection or avatar parameters is then transmitted. In one embodiment, at least one of a remote avatar selection or remote avatar parameters is received. The remote avatar selection may cause the device to display an avatar, while the remote avatar parameters may cause the device to animate the displayed avatar. Audio communication accompanies the avatar animation via known methods.

FIG. 1A illustrates device-to-device system 100 consistent with various embodiments of the present disclosure. The system 100 may generally include devices 102 and 112 communicating via network 122. Device 102 includes at least camera 104, microphone 106 and display 108. Device 112 includes at least camera 114, microphone 116 and display 118. Network 122 includes at least one server 124.

Devices 102 and 112 may include various hardware platforms that are capable of wired and/or wireless communication. For example, devices 102 and 112 may include, but are not limited to, videoconferencing systems, desktop computers, laptop computers, tablet computers, smart phones (e.g., iPhones®, Android®-based phones, Blackberries®, Symbian®-based phones, Palm®-based phones, etc.), cellular handsets, etc.

Cameras 104 and 114 include any device for capturing digital images representative of an environment that includes one or more persons, and may have adequate resolution for face analysis of the one or more persons in the environment as described herein. For example, cameras 104 and 114 may include still cameras (e.g., cameras configured to capture still photographs) or video cameras (e.g., cameras configured to capture moving images comprised of a plurality of frames). Cameras 104 and 114 may be configured to operate using light in the visible spectrum or with other portions of the electromagnetic spectrum including, but not limited to, the infrared spectrum, ultraviolet spectrum, etc. Cameras 104 and 114 may be incorporated within devices 102 and 112, respectively, or may be separate devices configured to communicate with devices 102 and 112 via wired or wireless communication. Specific examples of cameras 104 and 114 may include wired (e.g., Universal Serial Bus (USB), Ethernet, Firewire, etc.) or wireless (e.g., WiFi, Bluetooth, etc.) web cameras as may be associated with computers, video monitors, etc., mobile device cameras (e.g., cell phone or smart phone cameras integrated in, for example, the previously discussed example devices), integrated laptop computer cameras, integrated tablet computer cameras (e.g., iPad®, Galaxy Tab®, and the like), etc.

Devices 102 and 112 may further include microphones 106 and 116. Microphones 106 and 116 include any devices configured to sense sound. Microphones 106 and 116 may be integrated within devices 102 and 112, respectively, or may interact with the devices 102, 112 via wired or wireless communication such as described in the above examples regarding cameras 104 and 114. Displays 108 and 118 include any devices configured to display text, still images, moving images (e.g., video), user interfaces, graphics, etc. Displays 108 and 118 may be integrated within devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication such as described in the above examples regarding cameras 104 and 114.

In one embodiment, displays 108 and 118 are configured to display avatars 110 and 120, respectively. As referenced herein, an avatar is defined as a graphical representation of a user in either two dimensions (2D) or three dimensions (3D). Avatars do not have to resemble the looks of the user, and thus, while avatars can be lifelike representations, they can also take the form of drawings, cartoons, sketches, etc. As shown, device 102 may display avatar 110 representing the user of device 112 (e.g., a remote user), and likewise, device 112 may display avatar 120 representing the user of device 102. As such, users may view a representation of other users without having to exchange the large amounts of information that are generally involved with device-to-device communication employing live images.

Network 122 may include various second generation (2G), third generation (3G), fourth generation (4G) cellular-based data communication technologies, Wi-Fi wireless data communication technology, etc. Network 122 includes at least one server 124 configured to establish and maintain communication connections when using these technologies. For example, server 124 may be configured to support Internet-related communication protocols like Session Initiation Protocol (SIP) for creating, modifying and terminating two-party (unicast) and multi-party (multicast) sessions, Interactive Connectivity Establishment Protocol (ICE) for presenting a framework that allows protocols to be built on top of bytestream connections, Session Traversal Utilities for Network Address Translators, or NAT, Protocol (STUN) for allowing applications operating through a NAT to discover the presence of other NATs, IP addresses and ports allocated for an application's User Datagram Protocol (UDP) connection to connect to remote hosts, Traversal Using Relays around NAT (TURN) for allowing elements behind a NAT or firewall to receive data over Transmission Control Protocol (TCP) or UDP connections, etc.

FIG. 1B illustrates a virtual space system 126 consistent with various embodiments of the present disclosure. The system 126 may include device 102, device 112 and server 124. Device 102, device 112 and server 124 may continue to communicate in a manner similar to that illustrated in FIG. 1A, but user interaction may take place in virtual space 128 instead of in a device-to-device format. As referenced herein, a virtual space may be defined as a digital simulation of a physical location. For example, virtual space 128 may resemble an outdoor location like a city, road, sidewalk, field, forest, island, etc., or an inside location like an office, house, school, mall, store, etc.

Users, represented by avatars, may appear to interact in virtual space 128 as in the real world. Virtual space 128 may exist on one or more servers coupled to the Internet, and may be maintained by a third party. Examples of virtual spaces include virtual offices, virtual meeting rooms, virtual worlds like Second Life®, massively multiplayer online role-playing games (MMORPGs) like World of Warcraft®, massively multiplayer online real-life games (MMORLGs) like The Sims Online®, etc. In system 126, virtual space 128 may contain a plurality of avatars corresponding to different users. Instead of displaying avatars, displays 108 and 118 may display encapsulated (e.g., smaller) versions of virtual space (VS) 128. For example, display 108 may display a perspective view of what the avatar corresponding to the user of device 102 “sees” in virtual space 128. Similarly, display 118 may display a perspective view of what the avatar corresponding to the user of device 112 “sees” in virtual space 128. Examples of what avatars might see in virtual space 128 may include, but are not limited to, virtual structures (e.g., buildings), virtual vehicles, virtual objects, virtual animals, other avatars, etc.

FIG. 2 illustrates an example device 102 in accordance with various embodiments of the present disclosure. While only device 102 is described, device 112 (e.g., remote device) may include resources configured to provide the same or similar functions. As previously discussed, device 102 is shown including camera 104, microphone 106 and display 108. The camera 104 and microphone 106 may provide input to a camera and audio framework module 200. The camera and audio framework module 200 may include custom, proprietary, known and/or after-developed audio and video processing code (or instruction sets) that are generally well-defined and operable to control at least camera 104 and microphone 106. For example, the camera and audio framework module 200 may cause camera 104 and microphone 106 to record images and/or sounds, may process images and/or sounds, may cause images and/or sounds to be reproduced, etc. The camera and audio framework module 200 may vary depending on device 102, and more particularly, the operating system (OS) running in device 102. Example operating systems include iOS®, Android®, Blackberry® OS, Symbian®, Palm® OS, etc. A speaker 202 may receive audio information from camera and audio framework module 200 and may be configured to reproduce local sounds (e.g., to provide audio feedback of the user's voice) and remote sounds (e.g., the sound of the other parties engaged in a telephone call, video call or interaction in a virtual place).

The device 102 may further include a face detection module 204 configured to identify and track a head, face and/or facial region within image(s) provided by camera 104 and to determine one or more facial characteristics of the user (i.e., facial characteristics 206). For example, the face detection module 204 may include custom, proprietary, known and/or after-developed face detection code (or instruction sets), hardware, and/or firmware that are generally well-defined and operable to receive a standard format image (e.g., but not limited to, an RGB color image) and identify, at least to a certain extent, a face in the image.

The face detection module 204 may also be configured to track the detected face through a series of images (e.g., video frames at 24 frames per second) and to determine a head position based on the detected face. Known tracking systems that may be employed by face detection module 204 may include particle filtering, mean shift, Kalman filtering, etc., each of which may utilize edge analysis, sum-of-square-difference analysis, feature point analysis, histogram analysis, skin tone analysis, etc.
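As a minimal sketch of one of the tracking approaches named above, the following scalar Kalman filter smooths a single tracked coordinate (for example, the x position of the detected face center across frames). The constant-position motion model, the variance values, and the function name are illustrative assumptions and are not taken from the disclosure.

```python
def kalman_smooth(measurements, process_var=1e-2, meas_var=4.0):
    """Smooth noisy 1-D positions with a scalar Kalman filter.

    Assumes a constant-position model; process_var and meas_var are
    hypothetical tuning values, not values from the disclosure.
    """
    estimate = measurements[0]  # initialize the state from the first frame
    error = 1.0                 # assumed initial variance of the estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the position is assumed unchanged, so only uncertainty grows.
        error += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error / (error + meas_var)
        estimate += gain * (z - estimate)
        error *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


if __name__ == "__main__":
    print(kalman_smooth([100.0, 104.0, 96.0, 102.0, 98.0]))
```

A practical tracker would more likely carry position and velocity in a multi-dimensional state; the scalar form is used here only to keep the predict and update steps visible.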

The face detection module 204 may also include custom, proprietary, known and/or after-developed facial characteristics code (or instruction sets) that are generally well-defined and operable to receive a standard format image (e.g., but not limited to, an RGB color image) and identify, at least to a certain extent, one or more facial characteristics in the image. Such known facial characteristics systems include, but are not limited to, the CSU Face Identification Evaluation System by Colorado State University and the standard Viola-Jones boosting cascade framework, which may be found in the public Open Source Computer Vision (OpenCV™) package.

As discussed in greater detail herein, facial characteristics 206 may include features of the face, including, but not limited to, the location and/or shape of facial landmarks such as eyes, eyebrows, nose, mouth, etc., as well as movement of the eyes and/or eyelids. In one embodiment, avatar animation may be based on sensed facial actions (e.g., changes in facial characteristics 206). The corresponding feature points on an avatar's face may follow or mimic the movements of the real person's face, which is known as “expression clone” or “performance-driven facial animation.”

The face detection module 204 may also be configured to recognize an expression associated with the detected features (e.g., identifying whether a previously detected face is happy, sad, smiling, frowning, surprised, excited, etc.). Thus, the face detection module 204 may further include custom, proprietary, known and/or after-developed facial expression detection and/or identification code (or instruction sets) that is generally well-defined and operable to detect and/or identify expressions in a face. For example, the face detection module 204 may determine size and/or position of facial features (e.g., eyes, mouth, cheeks, teeth, etc.) and may compare these facial features to a facial feature database which includes a plurality of sample facial features with corresponding facial feature classifications (e.g., smiling, frowning, excited, sad, etc.).

The device 102 may further include an avatar selection module 208 configured to allow a user of device 102 to select an avatar for display on a remote device. The avatar selection module 208 may include custom, proprietary, known and/or after-developed user interface construction code (or instruction sets) that are generally well-defined and operable to present different avatars to a user so that the user may select one of the avatars.

In one embodiment, one or more avatars may be predefined in device 102. Predefined avatars allow all devices to have the same avatars, and during interaction only the selection of an avatar (e.g., the identification of a predefined avatar) needs to be communicated to a remote device or virtual space, which reduces the amount of information that needs to be exchanged. Avatars are selected prior to establishing communication, but may also be changed during the course of an active communication. Thus, it may be possible to send or receive an avatar selection at any point during the communication, and for the receiving device to change the displayed avatar in accordance with the received avatar selection.
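The benefit of predefined avatars can be illustrated with a minimal sketch in which both devices hold the same avatar table and only a short identifier crosses the network. The table contents, the JSON message layout, and the function names below are hypothetical and are not specified by the disclosure.

```python
import json

# Hypothetical table assumed to be identical on every participating device.
PREDEFINED_AVATARS = {0: "cartoon_cat", 1: "robot", 2: "lifelike_human"}


def encode_selection(avatar_id):
    """Build the small message that identifies a predefined avatar."""
    if avatar_id not in PREDEFINED_AVATARS:
        raise ValueError("unknown avatar id: %r" % (avatar_id,))
    message = {"type": "avatar_selection", "id": avatar_id}
    return json.dumps(message).encode("utf-8")


def decode_selection(payload):
    """Resolve a received selection against the local avatar table."""
    message = json.loads(payload.decode("utf-8"))
    return PREDEFINED_AVATARS[message["id"]]


if __name__ == "__main__":
    payload = encode_selection(1)
    print(len(payload), "bytes ->", decode_selection(payload))
```

Because the same message may be sent at any time, a mid-call avatar change in this sketch is simply another call to `encode_selection`.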

The device 102 may further include an avatar control module 210 configured to generate parameters for animating an avatar. Animation, as referred to herein, may be defined as altering the appearance of an image/model. A single animation may alter the appearance of a 2-D still image, or multiple animations may occur in sequence to simulate motion in the image (e.g., head turn, nodding, talking, frowning, smiling, laughing, blinking, winking, etc.). An example of animation for 3-D models includes deforming a 3-D wireframe model, applying a texture mapping, and re-computing the model vertex normals for rendering. A change in position of the detected face and/or facial characteristics 206, including facial features, may be converted into parameters that cause the avatar's features to resemble the features of the user's face.

In one embodiment, the general expression of the detected face may be converted into one or more parameters that cause the avatar to exhibit the same expression. The expression of the avatar may also be exaggerated to emphasize the expression. Knowledge of the selected avatar may not be necessary when avatar parameters may be applied generally to all of the predefined avatars. However, in one embodiment avatar parameters may be specific to the selected avatar, and thus, may be altered if another avatar is selected. For example, human avatars may require different parameter settings (e.g., different avatar features may be altered) to demonstrate emotions like happy, sad, angry, surprised, etc. than animal avatars, cartoon avatars, etc.
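One simple way to picture expression exaggeration is to amplify each parameter's deviation from a neutral value. The sketch below assumes that avatar parameters are named values in the range 0 to 1 and that a per-avatar gain is available; the parameter names, the range, and the gain are all illustrative assumptions.

```python
def exaggerate(params, neutral, gain=1.5):
    """Amplify each parameter's deviation from its neutral value.

    params and neutral map hypothetical parameter names to values in [0, 1];
    gain > 1 exaggerates the expression and gain == 1 leaves it unchanged.
    """
    result = {}
    for name, value in params.items():
        base = neutral.get(name, 0.0)
        amplified = base + gain * (value - base)
        result[name] = min(1.0, max(0.0, amplified))  # clamp to the valid range
    return result


if __name__ == "__main__":
    print(exaggerate({"smile": 0.6, "brow_raise": 0.1},
                     {"smile": 0.2, "brow_raise": 0.1}))
```

A different gain per avatar would be one way to realize avatar-specific parameter settings, for example a larger gain for a cartoon avatar than for a lifelike one.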

The avatar control module 210 may include custom, proprietary, known and/or after-developed graphics processing code (or instruction sets) that are generally well-defined and operable to generate parameters for animating the avatar selected by avatar selection module 208 based on the face/head position and/or facial characteristics 206 detected by face detection module 204. For facial feature-based animation methods, 2-D avatar animation may be done with, for example, image warping or image morphing, whereas 3-D avatar animation may be done with free form deformation (FFD) or by utilizing the animation structure defined in a 3-D model of a head. Oddcast is an example of a software resource usable for 2-D avatar animation, while FaceGen is an example of a software resource usable for 3-D avatar animation.

In addition, in system 100, the avatar control module 210 may receive a remote avatar selection and remote avatar parameters usable for displaying and animating an avatar corresponding to a user at a remote device. The avatar control module 210 may cause a display module 212 to display an avatar 110 on the display 108. The display module 212 may include custom, proprietary, known and/or after-developed graphics processing code (or instruction sets) that are generally well-defined and operable to display and animate an avatar on display 108 in accordance with the example device-to-device embodiment.

For example, the avatar control module 210 may receive a remote avatar selection and may interpret the remote avatar selection to correspond to a predetermined avatar. The display module 212 may then display avatar 110 on display 108. Moreover, remote avatar parameters received in avatar control module 210 may be interpreted, and commands may be provided to display module 212 to animate avatar 110.

In one embodiment, more than two users may engage in the video call. When more than two users are interacting in a video call, the display 108 may be divided or segmented to allow more than one avatar corresponding to remote users to be displayed simultaneously. Alternatively, in system 126, the avatar control module 210 may receive information causing the display module 212 to display what the avatar corresponding to the user of device 102 is “seeing” in virtual space 128 (e.g., from the visual perspective of the avatar). For example, the display 108 may display buildings, objects, animals represented in virtual space 128, other avatars, etc. In one embodiment, the avatar control module 210 may be configured to cause the display module 212 to display a “feedback” avatar 214. The feedback avatar 214 represents how the selected avatar appears on the remote device, in a virtual place, etc. In particular, the feedback avatar 214 appears as the avatar selected by the user and may be animated using the same parameters generated by avatar control module 210. In this way the user may confirm what the remote user is seeing during their interaction.

The device 102 may further include a communication module 216 configured to transmit and receive information for selecting avatars, displaying avatars, animating avatars, displaying virtual place perspective, etc. The communication module 216 may include custom, proprietary, known and/or after-developed communication processing code (or instruction sets) that are generally well-defined and operable to transmit avatar selections and avatar parameters and to receive remote avatar selections and remote avatar parameters. The communication module 216 may also transmit and receive audio information corresponding to avatar-based interactions. The communication module 216 may transmit and receive the above information via network 122 as previously described.
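To give a sense of how compact a per-frame avatar update can be, the following sketch packs an avatar identifier and six animation parameters into a fixed binary layout. The layout (one unsigned byte followed by six little-endian 32-bit floats) and the meaning of the six values are assumptions made for illustration; the disclosure does not define a wire format.

```python
import struct

# Hypothetical layout: 1-byte avatar id followed by six 32-bit floats,
# little-endian with no padding.
PARAM_FORMAT = "<B6f"


def pack_parameters(avatar_id, params):
    """Serialize an avatar id and six animation parameters."""
    if len(params) != 6:
        raise ValueError("expected exactly six parameters")
    return struct.pack(PARAM_FORMAT, avatar_id, *params)


def unpack_parameters(payload):
    """Recover the avatar id and the list of six parameters."""
    avatar_id, *params = struct.unpack(PARAM_FORMAT, payload)
    return avatar_id, params


if __name__ == "__main__":
    payload = pack_parameters(2, [0.0, 0.5, 1.0, -0.25, 0.75, 0.125])
    print(len(payload), unpack_parameters(payload))
```

Under this assumed layout each update occupies 25 bytes. For comparison, a single uncompressed 640x480 RGB frame is 640 x 480 x 3 = 921,600 bytes, although real video calls send compressed frames that are far smaller than that.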

The device 102 may further include one or more processor(s) 218 configured to perform operations associated with device 102 and one or more of the modules included therein.

FIG. 3 illustrates an example face detection module 204a consistent with various embodiments of the present disclosure. The face detection module 204a may be configured to receive one or more images from the camera 104 via the camera and audio framework module 200 and identify, at least to a certain extent, a face (or optionally multiple faces) in the image. The face detection module 204a may also be configured to identify and determine, at least to a certain extent, one or more facial characteristics 206 in the image. The facial characteristics 206 may be generated based on one or more of the facial parameters identified by the face detection module 204a as described herein. The facial characteristics 206 may include features of the face, including, but not limited to, the location and/or shape of facial landmarks such as eyes, eyebrows, nose, mouth, etc., as well as movement of the mouth, eyes and/or eyelids.

In the illustrated embodiment, the face detection module 204a may include a face detection/tracking module 300, a face normalization module 302, a landmark detection module 304, a facial pattern module 306, a face posture module 308, a facial expression detection module 310, an eye detection/tracking module 312 and an eye classification module 314. The face detection/tracking module 300 may include custom, proprietary, known and/or after-developed face tracking code (or instruction sets) that is generally well-defined and operable to detect and identify, at least to a certain extent, the size and location of human faces in a still image or video stream received from the camera 104. Such known face detection/tracking systems include, for example, the techniques of Viola and Jones, published as Paul Viola and Michael Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, Conference on Computer Vision and Pattern Recognition, 2001. These techniques use a cascade of Adaptive Boosting (AdaBoost) classifiers to detect a face by scanning a window exhaustively over an image. The face detection/tracking module 300 may also track a face or facial region across multiple images.
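The cited Viola-Jones technique evaluates simple rectangle features inside each scanned window, and the cited paper computes them quickly using an integral image. The present description does not detail that step, so the following is only a sketch of the well-known idea; the function names are illustrative, and the trained AdaBoost cascade that would consume such features is omitted.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x, y, w, h):
    """Sum of the pixels in the w-by-h rectangle whose top-left is (x, y)."""
    return (table[y + h][x + w] - table[y][x + w]
            - table[y + h][x] + table[y][x])


def two_rect_feature(table, x, y, w, h):
    """Left-half sum minus right-half sum of a window (w must be even)."""
    half = w // 2
    return (rect_sum(table, x, y, half, h)
            - rect_sum(table, x + half, y, half, h))


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2), two_rect_feature(table, 0, 0, 4, 3))
```

Each rectangle sum costs four table lookups regardless of rectangle size, which is what makes exhaustive window scanning practical.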

The face normalization module 302 may include custom, proprietary, known and/or after-developed face normalization code (or instruction sets) that is generally well-defined and operable to normalize the identified face in the image. For example, the face normalization module 302 may be configured to rotate the image to align the eyes (if the coordinates of the eyes are known), crop the image to a smaller size generally corresponding to the size of the face, scale the image to make the distance between the eyes constant, apply a mask that zeros out pixels not in an oval that contains a typical face, histogram equalize the image to smooth the distribution of gray values for the non-masked pixels, and/or normalize the image so the non-masked pixels have mean zero and standard deviation one.
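Two of the normalization steps listed above can be sketched briefly: deriving the rotation and scale that align the eyes, and normalizing the non-masked pixels to zero mean and unit standard deviation. The function names, the use of nested lists for images, and the boolean mask representation are assumptions made for illustration; cropping, the actual image warp, and histogram equalization are omitted.

```python
import math


def eye_alignment(left_eye, right_eye, target_distance):
    """Return (angle, scale): the roll angle of the eye line in radians and
    the scale factor that makes the inter-eye distance equal target_distance.
    Rotating the image by -angle would level the eyes."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx), target_distance / math.hypot(dx, dy)


def normalize_masked(gray, keep):
    """Zero masked pixels; give kept pixels mean zero and unit std deviation.

    gray is a 2-D list of gray values; keep is a same-shaped 2-D list of
    booleans (True inside the hypothetical face oval).
    """
    rows, cols = len(gray), len(gray[0])
    values = [gray[y][x] for y in range(rows) for x in range(cols) if keep[y][x]]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero on flat images
    return [[(gray[y][x] - mean) / std if keep[y][x] else 0.0
             for x in range(cols)] for y in range(rows)]


if __name__ == "__main__":
    print(eye_alignment((10, 20), (50, 20), 80))
    print(normalize_masked([[10, 20], [30, 99]], [[True, True], [True, False]]))
```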

The landmark detection module 304 may include custom, proprietary, known and/or after-developed landmark detection code (or instruction sets) that is generally well-defined and operable to detect and identify, at least to a certain extent, the various facial features of the face in the image. Implicit in landmark detection is that the face has already been detected, at least to some extent. Optionally, some degree of localization may have been performed (for example, by the face normalization module 302) to identify/focus on the zones/areas of the image where landmarks can potentially be found. For example, the landmark detection module 304 may be based on heuristic analysis and may be configured to identify and/or analyze the relative position, size, and/or shape of the eyes (and/or the corners of the eyes), nose (e.g., the tip of the nose), chin (e.g., the tip of the chin), cheekbones, and jaw. The eye corners and mouth corners may also be detected using a Viola-Jones-based classifier.

The facial pattern module 306 may include custom, proprietary, known and/or after-developed facial pattern code (or instruction sets) that is generally well-defined and operable to identify and/or generate a facial pattern based on the identified facial landmarks in the image. As may be appreciated, the facial pattern module 306 may be considered a portion of the face detection/tracking module 300.

The face posture module 308 may include custom, proprietary, known and/or after-developed facial orientation detection code (or instruction sets) that is generally well-defined and operable to detect and identify, at least to a certain extent, the posture of the face in the image. For example, the face posture module 308 may be configured to establish the posture of the face in the image with respect to the display 108 of the device 102. More specifically, the face posture module 308 may be configured to determine whether the user's face is directed toward the display 108 of the device 102, thereby indicating whether the user is observing the content being displayed on the display 108.

The facial expression detection module 310 may include custom, proprietary, known and/or after-developed facial expression detection and/or identification code (or instruction sets) that is generally well-defined and operable to detect and/or identify facial expressions of the user in the image. For example, the facial expression detection module 310 may determine size and/or position of the facial features (e.g., eyes, mouth, cheeks, teeth, etc.) and compare the facial features to a facial feature database which includes a plurality of sample facial features with corresponding facial feature classifications.
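The comparison against a facial feature database can be pictured as a nearest-sample lookup. In the sketch below the database entries, the three-value feature vector (mouth width, mouth openness and brow height, each assumed to be normalized by face size), and the use of Euclidean distance are all hypothetical; the disclosure does not prescribe a feature set or a distance measure.

```python
import math

# Hypothetical samples: (mouth_width, mouth_openness, brow_height) -> label.
FEATURE_DB = [
    ((0.55, 0.10, 0.30), "smiling"),
    ((0.40, 0.05, 0.22), "frowning"),
    ((0.42, 0.35, 0.40), "surprised"),
]


def classify_expression(features, database=FEATURE_DB):
    """Return the label of the database sample closest to the features."""
    closest = min(database, key=lambda entry: math.dist(features, entry[0]))
    return closest[1]


if __name__ == "__main__":
    print(classify_expression((0.54, 0.12, 0.31)))
```

A production system would likely hold many samples per classification and use a trained classifier rather than a single nearest neighbor.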

The eye detection/tracking module 312 may include custom, proprietary, known and/or after-developed eye tracking code (or instruction sets) that is generally well-defined and operable to detect and identify, at least to a certain extent, eye movement and/or eye gaze or focus of the user in the image. Similar to the face posture module 308, the eye detection/tracking module 312 may be configured to establish the direction in which the user's eyes are directed with respect to the display 108 of the device 102. The eye detection/tracking module 312 may be further configured to establish eye blinking of a user.

As shown, the eye detection/tracking module 312 may include an eye classification module 314 configured to determine whether the user's eyes (individually and/or both) are open or closed and movement of the user's eyes with respect to the display 108. In particular, the eye classification module 314 is configured to receive one or more normalized images (images normalized by the normalization module 302). Normalizing an image may include, but is not limited to, rotating the image to align the eyes (if the coordinates of the eyes are known), cropping the image, particularly cropping the eyes with reference to the eye-corner position, scaling the image to make the distance between the eyes constant, histogram equalizing the image to smooth the distribution of gray values for the non-masked pixels, and/or normalizing the image so the non-masked pixels have mean zero and a unit standard deviation.

Upon receipt of one or more normalized images, the eye classification module 314 may be configured to separately identify eye opening/closing and/or eye movement (e.g., looking left/right, up/down, diagonally, etc.) with respect to the display 108 and, as such, determine a status of the user's eyes in real-time or near real-time during active video communication and/or interaction. The eye classification module 314 may include custom, proprietary, known and/or after-developed eye tracking code (or instruction sets) that is generally well-defined and operable to detect and identify, at least to a certain extent, movement of the eyelids and eyes of the user in the image. In one embodiment, the eye classification module 314 may use statistical-based analysis in order to identify the status of the user's eyes (open/close, movement, etc.), including, but not limited to, linear discriminant analysis (LDA), artificial neural network (ANN) and/or support vector machine (SVM). During analysis, the eye classification module 314 may further utilize an eye status database, which may include a plurality of sample eye features with corresponding eye feature classifications.
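Once a per-frame open/closed status is available, blink events can be derived from the sequence of statuses. The sketch below assumes that a classifier (such as the LDA, ANN or SVM options named above, not implemented here) has already produced one boolean per frame, and it treats a short closed run followed by reopening as a blink. The frame-count threshold is a hypothetical value, roughly a quarter of a second at 24 frames per second.

```python
def detect_blinks(eye_open, max_closed_frames=6):
    """Return the start indices of closed runs short enough to be blinks.

    eye_open holds one boolean per frame (True means open). A closed run
    counts as a blink only if the eye reopens within max_closed_frames,
    which is an assumed threshold and not a value from the disclosure.
    """
    blinks = []
    run_start = None
    for index, is_open in enumerate(eye_open):
        if not is_open and run_start is None:
            run_start = index                     # a closed run begins
        elif is_open and run_start is not None:
            if index - run_start <= max_closed_frames:
                blinks.append(run_start)          # short run: count a blink
            run_start = None                      # the run has ended
    return blinks


if __name__ == "__main__":
    print(detect_blinks([True, False, False, True, True, False, True]))
```

Longer closed runs are left for the caller to interpret, for example as deliberately closed eyes that the avatar should hold closed.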

As previously described, avatar animation may be based on sensed facial actions (e.g., changes in facial characteristics 206 of a user, including eye and/or eyelid movement). The corresponding feature points on an avatar's face may follow or mimic the movements of the real person's face, which is known as “expression clone” or “performance-driven facial animation.” Accordingly, eye opening/closing and eye movement may be animated in the avatar model during active video communication and/or interaction by any known methods.

For example, upon receipt of the avatar selection and avatar parameters from the device 102, an avatar control module of the remote device 112 may be configured to control (e.g., animate) the avatar based on the facial characteristics 206, including the eye and/or eyelid movement of the user. This may include normalizing and remapping the user's face to the avatar face, copying any changes to the facial characteristics 206 and driving the avatar to perform the same facial characteristics and/or expression changes. For facial feature-based animation methods, 2-D avatar animation may be done with, for example, image warping or image morphing, whereas 3-D avatar animation may be done with free form deformation (FFD) or by utilizing the animation structure defined in a 3-D model of a head. Oddcast is an example of a software resource usable for 2-D avatar animation, while FaceGen is an example of a software resource usable for 3-D avatar generation and animation.
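
As a simplified illustration of driving avatar feature points from a received parameter, the sketch below linearly blends 2-D key points between two avatar poses (for example, eyelid open and eyelid closed). This is a reduced stand-in for image morphing or warping, which would also deform the image pixels; the function name, the key-point lists and the blend weight are assumptions made for this sketch.

```python
def blend_keypoints(neutral, target, weight):
    """Linearly interpolate 2-D key points between two avatar poses.

    weight 0.0 yields the neutral pose and 1.0 the target pose; values
    outside that range are clamped.
    """
    if len(neutral) != len(target):
        raise ValueError("poses must have the same number of key points")
    w = min(1.0, max(0.0, weight))
    return [
        (nx + w * (tx - nx), ny + w * (ty - ny))
        for (nx, ny), (tx, ty) in zip(neutral, target)
    ]
```

For instance, with an upper-eyelid point at (10, 4) in the open pose and (10, 0) in the closed pose, a weight of 0.5 places the point at (10.0, 2.0), a half-closed eyelid.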

FIG. 4 illustrates an example system implementation in accordance with at least one embodiment. Device 102′ is configured to communicate wirelessly via WiFi connection 400 (e.g., at work), server 124′ is configured to negotiate a connection between devices 102′ and 112′ via Internet 402, and device 112′ is configured to communicate wirelessly via another WiFi connection 404 (e.g., at home). In one embodiment, a device-to-device avatar-based video call application is activated in device 102′. Following avatar selection, the application may allow at least one remote device (e.g., device 112′) to be selected. The application may then cause device 102′ to initiate communication with device 112′. Communication may be initiated with device 102′ transmitting a connection establishment request to device 112′ via enterprise access point (AP) 406. The enterprise AP 406 may be an AP usable in a business setting, and thus, may support higher data throughput and more concurrent wireless clients than home AP 414. The enterprise AP 406 may receive the wireless signal from device 102′ and may proceed to transmit the connection establishment request through various business networks via gateway 408. The connection establishment request may then pass through firewall 410, which may be configured to control information flowing into and out of the WiFi network 400.

The connection establishment request of device 102′ may then be processed by server 124′. The server 124′ may be configured for registration of IP addresses, authentication of destination addresses and NAT traversals so that the connection establishment request may be directed to the correct destination on Internet 402. For example, server 124′ may resolve the intended destination (e.g., remote device 112′) from information in the connection establishment request received from device 102′, and may route the signal through the correct NATs and ports to the destination IP address accordingly. These operations may only have to be performed during connection establishment, depending on the network configuration.
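
The registration and resolution roles described for the server might be sketched, in greatly simplified form, as a lookup from a device identifier to a registered address. Authentication and NAT traversal are omitted. The class name, method names, request layout and addresses (taken from a range reserved for documentation) are all hypothetical.

```python
class ConnectionServer:
    """Toy registry that resolves a destination device to a registered address."""

    def __init__(self):
        self._registry = {}

    def register(self, device_id, ip, port):
        """Record the address at which a device can currently be reached."""
        self._registry[device_id] = (ip, port)

    def resolve(self, request):
        """Return the (ip, port) registered for the request's destination."""
        destination = request["to"]
        if destination not in self._registry:
            raise LookupError(f"unknown destination: {destination}")
        return self._registry[destination]


server = ConnectionServer()
server.register("device-112", "192.0.2.20", 5060)
address = server.resolve({"from": "device-102", "to": "device-112"})
```

In this sketch the resolved address would then be used to forward the connection establishment request to the remote device.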

In some instances operations may be repeated during the video call in order to provide notification to the NAT to keep the connection alive. Media and Signal Path 412 may carry the video (e.g., avatar selection and/or avatar parameters) and audio information directly to home AP 414 after the connection has been established. Device 112′ may then receive the connection establishment request and may be configured to determine whether to accept the request. Determining whether to accept the request may include, for example, presenting a visual narrative to a user of device 112′ inquiring as to whether to accept the connection request from device 102′. Should the user of device 112′ accept the connection (e.g., accept the video call), the connection may be established. Cameras 104′ and 114′ may be configured to then start capturing images of the respective users of devices 102′ and 112′ for use in animating the avatars selected by each user. Microphones 106′ and 116′ may be configured to then start recording audio from each user. As information exchange commences between devices 102′ and 112′, displays 108′ and 118′ may display and animate avatars corresponding to the users of devices 102′ and 112′.

FIG. 5 is a flowchart of example operations in accordance with at least one embodiment. In operation 502 an application (e.g., an avatar-based video call application) may be activated in a device. Activation of the application may be followed by selection of an avatar. Selection of an avatar may include an interface being presented by the application, the interface allowing the user to select a predefined avatar. After avatar selection, communications may be configured in operation 504. Communication configuration includes the identification of at least one remote device or a virtual space for participation in the video call. For example, a user may select from a list of remote users/devices stored within the application, stored in association with another system in the device (e.g., a contacts list in a smart phone, cell phone, etc.), or stored remotely, such as on the Internet (e.g., in a social media website like Facebook, LinkedIn, Yahoo, Google+, MSN, etc.). Alternatively, the user may select to go online in a virtual space like Second Life.

In operation 506, communication may be initiated between the device and the at least one remote device or virtual space. For example, a connection establishment request may be transmitted to the remote device or virtual space. For the sake of explanation herein, it is assumed that the connection establishment request is accepted by the remote device or virtual space. A camera in the device may then begin capturing images in operation 508. The images may be still images or live video (e.g., multiple images captured in sequence). In operation 510 image analysis may occur starting with detection/tracking of a face/head in the image. The detected face may then be analyzed in order to detect facial characteristics (e.g., facial landmarks, facial expression, etc.). In operation 512 the detected face/head position and/or facial characteristics are converted into avatar parameters. Avatar parameters are used to animate the selected avatar on the remote device or in the virtual space. In operation 514 at least one of the avatar selection or the avatar parameters may be transmitted.

Avatars may be displayed and animated in operation 516. In the instance of device-to-device communication (e.g., system 100), at least one of remote avatar selection or remote avatar parameters may be received from the remote device. An avatar corresponding to the remote user may then be displayed based on the received remote avatar selection, and may be animated based on the received remote avatar parameters. In the instance of virtual space interaction (e.g., system 126), information may be received allowing the device to display what the avatar corresponding to the device user is seeing. A determination may then be made in operation 518 as to whether the current communication is complete. If it is determined in operation 518 that the communication is not complete, operations 508-516 may repeat in order to continue to display and animate an avatar on the remote device based on the analysis of the user's face. Otherwise, in operation 520 the communication may be terminated. The video call application may also be terminated if, for example, no further video calls are to be made.
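
The sending side of the repeated operations 508-514 may be sketched as a simple loop, shown here purely for illustration. Face detection and transmission are passed in as callables so that the sketch stays self-contained. The parameter layout (an eyelid value and a gaze label), the function names and the use of an exhausted frame sequence to stand in for the completion check of operation 518 are assumptions made for this sketch.

```python
def to_avatar_parameters(characteristics):
    """Operation 512: convert detected facial characteristics into avatar parameters."""
    return {
        "eyelid": 1.0 if characteristics["eyes_open"] else 0.0,
        "gaze": characteristics["gaze"],
    }


def run_call(frames, detect_face, transmit):
    """Repeat operations 508-514 for each captured frame; return the number sent.

    frames: iterable of captured images (operation 508).
    detect_face: callable returning facial characteristics, or None when no
        face is found (operation 510).
    transmit: callable that sends avatar parameters to the remote device
        (operation 514).
    """
    sent = 0
    for frame in frames:
        characteristics = detect_face(frame)
        if characteristics is None:
            continue  # no face in this frame, so nothing to convert or send
        transmit(to_avatar_parameters(characteristics))
        sent += 1
    return sent
```

Display and animation of the remote avatar (operation 516) would run on the receiving side from the transmitted parameters and is not modeled here.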

While FIG. 5 illustrates various operations according to an embodiment, it is to be understood that not all of the operations depicted in FIG. 5 are necessary for other embodiments. Indeed, it is fully contemplated herein that in other embodiments of the present disclosure, the operations depicted in FIG. 5 and/or other operations described herein may be combined in a manner not specifically shown in any of the drawings, but still fully consistent with the present disclosure. Thus, claims directed to features and/or operations that are not exactly shown in one drawing are deemed within the scope and content of the present disclosure.

A system consistent with the present disclosure provides detection and/or tracking of a user's eyes during active communication, including the detection of characteristics of a user's eyes, including, but not limited to, eyeball movement, gaze direction and/or point of focus of the user's eyes, eye blinking, etc. The system uses a statistical-based approach for the determination of the status (e.g., open/closed eye and/or direction of eye gaze) of a user's eyes. The system further provides avatar animation based at least in part on the detected characteristics of the user's eyes in real-time or near real-time during active communication and interaction. Animation of a user's eyes may enhance interaction between users, as the human eyes and the characteristics associated with them, including movement and expression, may convey rich information during active communication, such as, for example, a user's interest, emotions, etc.

A system consistent with the present disclosure provides advantages. For example, the use of statistical-based methods allows the performance of eye analysis and classification to be improved by increasing sample collection and classifier re-training. Additionally, in contrast to other known methods of eye analysis, such as, for example, template-matching methods and/or geometry-based methods, a system consistent with the present disclosure generally does not require calibration before use, nor does the system require special hardware, such as, for example, infrared lighting or a close-view camera. Additionally, a system consistent with the present disclosure does not require a learning process for new users.

Various features, aspects, and embodiments have been described herein. The features, aspects, and embodiments are susceptible to combination with one another as well as to variation and modification, as will be understood by those having skill in the art. The present disclosure should, therefore, be considered to encompass such combinations, variations, and modifications. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

As used in any embodiment herein, the term “module” may refer to software, firmware and/or circuitry configured to perform any of the aforementioned operations. Software may be embodied as a software package, code, instructions, instruction sets and/or data recorded on a non-transitory computer readable storage medium. Firmware may be embodied as code, instructions or instruction sets and/or data that are hard-coded (e.g., nonvolatile) in memory devices. “Circuitry”, as used in any embodiment herein, may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry such as computer processors comprising one or more individual instruction processing cores, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry. The modules may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example, an integrated circuit (IC), system-on-chip (SoC), desktop computers, laptop computers, tablet computers, servers, smart phones, etc.

Any of the operations described herein may be implemented in a system that includes one or more storage mediums having stored thereon, individually or in combination, instructions that when executed by one or more processors perform the methods. Here, the processor may include, for example, a server CPU, a mobile device CPU, and/or other programmable circuitry. Also, it is intended that operations described herein may be distributed across a plurality of physical devices, such as processing structures at more than one different physical location. The storage medium may include any type of tangible medium, for example, any type of disk including hard disks, floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic and static RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), flash memories, Solid State Disks (SSDs), magnetic or optical cards, or any type of media suitable for storing electronic instructions. Other embodiments may be implemented as software modules executed by a programmable control device. The storage medium may be non-transitory.

The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible within the scope of the claims. Accordingly, the claims are intended to cover all such equivalents.

As described herein, various embodiments may be implemented using hardware elements, software elements, or any combination thereof. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.

Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

According to one aspect, there is provided a system for interactive avatar communication between a first user device and a remote user device. The system includes a camera configured to capture images, and a communication module configured to initiate and establish communication, and to transmit and receive information, between said first and said remote user devices. The system further includes one or more storage mediums having stored thereon, individually or in combination, instructions that when executed by one or more processors result in one or more operations. The operations include selecting an avatar, initiating communication, capturing an image, detecting a face in the image and determining facial characteristics from the face. The facial characteristics include at least one of eye movement and eyelid movement. The operations further include converting the facial characteristics to avatar parameters and transmitting at least one of the avatar selection and avatar parameters.

Another example system includes the foregoing components and determining facial characteristics from the face includes determining a facial expression in the face.

Another example system includes the foregoing components and the avatar selection and avatar parameters are used to generate an avatar on a remote device, the avatar being based on the facial characteristics.

Another example system includes the foregoing components and the avatar selection and avatar parameters are used to generate an avatar in a virtual space, the avatar being based on the facial characteristics.

Another example system includes the foregoing components and the instructions, when executed by one or more processors, result in the additional operation of receiving at least one of a remote avatar selection or remote avatar parameters.

Another example system includes the foregoing components and further includes a display, and the instructions, when executed by one or more processors, result in the additional operation of displaying an avatar based on the remote avatar selection.

Another example system includes the foregoing components and the instructions, when executed by one or more processors, result in the additional operation of animating the displayed avatar based on the remote avatar parameters.

According to one aspect, there is provided an apparatus for interactive avatar communication between a first user device and a remote user device. The apparatus includes a communication module configured to initiate and establish communication between the first and the remote user devices and to transmit information between the first and the remote user devices. The apparatus further includes an avatar selection module configured to allow a user to select an avatar for use during the communication. The apparatus further includes a face detection module configured to detect a facial region in an image of the user and to detect and identify one or more facial characteristics of the face. The facial characteristics include eye movement and eyelid movement of the user. The apparatus further includes an avatar control module configured to convert the facial characteristics to avatar parameters. The communication module is configured to transmit at least one of the avatar selection and avatar parameters.

Another example apparatus includes the foregoing components and further includes an eye detection/tracking module configured to detect and identify at least one of eye movement of the user with respect to a display and eyelid movement of the user.

Another example apparatus includes the foregoing components and the eye detection/tracking module includes an eye classification module configured to determine at least one of gaze direction of the user's eyes and blinking of the user's eyes.

Another example apparatus includes the foregoing components and the avatar selection and avatar parameters are used to generate an avatar on the remote device, the avatar being based on the facial characteristics.

Another example apparatus includes the foregoing components and the communication module is configured to receive at least one of a remote avatar selection and remote avatar parameters.

Another example apparatus includes the foregoing components and further includes a display configured to display an avatar based on the remote avatar selection.

Another example apparatus includes the foregoing components and the avatar control module is configured to animate the displayed avatar based on the remote avatar parameters.

According to another aspect there is provided a method for interactive avatar communication. The method includes selecting an avatar, initiating communication, capturing an image, detecting a face in the image and determining facial characteristics from the face. The facial characteristics include at least one of eye movement and eyelid movement. The method further includes converting the facial characteristics to avatar parameters and transmitting at least one of the avatar selection and avatar parameters.

Another example method includes the foregoing operations and determining facial characteristics from the face includes determining a facial expression in the face.

Another example method includes the foregoing operations and the avatar selection and avatar parameters are used to generate an avatar on a remote device, the avatar being based on the facial characteristics.

Another example method includes the foregoing operations and the avatar selection and avatar parameters are used to generate an avatar in a virtual space, the avatar being based on the facial characteristics.

Another example method includes the foregoing operations and further includes receiving at least one of a remote avatar selection or remote avatar parameters.

Another example method includes the foregoing operations and further includes displaying an avatar based on the remote avatar selection on a display.

Another example method includes the foregoing operations and further includes animating the displayed avatar based on the remote avatar parameters.

According to another aspect there is provided at least one computer accessible medium including instructions stored thereon. When executed by one or more processors, the instructions may cause a computer system to perform operations for interactive avatar communication. The operations include selecting an avatar, initiating communication, capturing an image, detecting a face in the image and determining facial characteristics from the face. The facial characteristics include at least one of eye movement and eyelid movement. The operations further include converting the facial characteristics to avatar parameters and transmitting at least one of the avatar selection and avatar parameters.

Another example computer accessible medium includes the foregoing operations and determining facial characteristics from the face includes determining a facial expression in the face.

Another example computer accessible medium includes the foregoing operations and the avatar selection and avatar parameters are used to generate an avatar on a remote device, the avatar being based on the facial characteristics.

Another example computer accessible medium includes the foregoing operations and the avatar selection and avatar parameters are used to generate an avatar in a virtual space, the avatar being based on the facial characteristics.

Another example computer accessible medium includes the foregoing operations and further includes receiving at least one of a remote avatar selection or remote avatar parameters.

Another example computer accessible medium includes the foregoing operations and further includes displaying an avatar based on the remote avatar selection on a display.

Another example computer accessible medium includes the foregoing operations and further includes animating the displayed avatar based on the remote avatar parameters.

What is claimed is:
 1. At least one device to communicate using avatars, comprising: avatar selection circuitry to receive from a user a selection of an avatar from a plurality of predefined avatars, the selected avatar to be animated on a remote device; communication circuitry to transmit the avatar selection to the remote device; camera and audio framework circuitry to capture at least one image; and face detection circuitry to track the user's face in the at least one image and identify one or more facial characteristics in the face, the face detection circuitry including at least: eye detection and tracking circuitry to determine current eye status based on the at least one image, determine changes in eye status based on the current eye status and formulate at least one instruction to cause the avatar to mimic any changes in eye status that were determined, the at least one instruction being transmitted to the remote device via the communication circuitry for use in animating of the avatar.
 2. The at least one device of claim 1, wherein the camera and audio framework circuitry is coupled to at least one camera to capture the at least one image, at least one microphone to capture local sounds and at least one speaker to reproduce the local sounds to provide audio feedback to the user and to reproduce remote sounds captured by the remote device.
 3. The at least one device of claim 1, wherein the face detection circuitry is further to recognize an expression by classifying the one or more facial characteristics, the at least one instruction at least indicating the expression to the remote device for use in animating the avatar.
 4. The at least one device of claim 3, wherein the at least one instruction is to cause the avatar to display an exaggerated version of the expression to emphasize the expression.
 5. The at least one device of claim 1, further comprising a display to display at least a feedback avatar replicating the animation of the avatar on the remote device.
 6. The at least one device of claim 5, wherein the at least one instruction is to cause the avatar to be animated on the remote device in a virtual space and the display is further to display a perspective view of what the avatar sees in the virtual space.
 7. The at least one device of claim 1, wherein the at least one instruction is specific to the selected avatar.
 8. The at least one device of claim 7, wherein upon selection of a new avatar the avatar selection circuitry is to cause the communication circuitry to transmit the new avatar selection to the remote device and alter the at least one instruction based on the new avatar.
 9. The at least one device of claim 1, wherein the face detection circuitry further comprises face normalization circuitry to normalize a face detected in the at least one image.
 10. The at least one device of claim 9, wherein normalization comprises at least rotating the at least one image to align eyes within a face captured in the image, cropping the at least one image corresponding to a size of the face, scaling the at least one image to make a distance between the eyes constant, applying a mask to zero out pixels not within an oval that would contain a typical face, histogram equalizing the at least one image to smooth the distribution of gray values for non-masked pixels, and normalizing the at least one image so the non-masked pixels have zero mean and a standard deviation of one.
 11. The at least one device of claim 9, wherein the eye detection and tracking circuitry comprises eye classification circuitry to determine the current eye status including at least one of eye opening, eye closing or eye movement based on at least one normalized image received from the face normalization circuitry.
 12. A method for communicating using avatars, comprising: receiving from a user, via avatar selection circuitry, a selection of an avatar from a plurality of predefined avatars, the selected avatar to be animated on a remote device; transmitting, using communication circuitry, the avatar selection to the remote device; capturing, using camera and audio framework circuitry, at least one image; tracking, using face detection circuitry, the user's face in the at least one image and identifying one or more facial characteristics in the face; determining, using eye detection and tracking circuitry in the face detection circuitry, current eye status based on the at least one image, determining changes in eye status based on the current eye status and formulating at least one instruction to cause the avatar to mimic any changes in eye status that were determined, the at least one instruction being transmitted to the remote device via the communication circuitry for use in animating of the avatar.
 13. The method of claim 12, further comprising: capturing, using at least one camera coupled to the camera and audio framework circuitry, the at least one image; capturing, using at least one microphone coupled to the camera and audio framework circuitry, local sounds; reproducing, using at least one speaker coupled to the camera and audio framework circuitry, the local sounds to provide audio feedback to the user and remote sounds captured by the remote device.
 14. The method of claim 12, further comprising: recognizing, using the face detection circuitry, an expression by classifying the one or more facial characteristics, the at least one instruction at least indicating the expression to the remote device for use in animating the avatar.
 15. The method of claim 12, further comprising: displaying, using a display, at least a feedback avatar replicating the animation of the avatar on the remote device.
 16. The method of claim 12, further comprising: normalizing, using face normalization circuitry in the face detection circuitry, a face detected in the at least one image.
 17. The method of claim 16, wherein normalizing comprises at least rotating the at least one image to align eyes within a face captured in the image, cropping the at least one image corresponding to a size of the face, scaling the at least one image to make a distance between the eyes constant, applying a mask to zero out pixels not within an oval that would contain a typical face, histogram equalizing the at least one image to smooth the distribution of gray values for non-masked pixels, and normalizing the at least one image so the non-masked pixels have zero mean and a standard deviation of one.
 18. The method of claim 16, further comprising: determining, using eye classification circuitry in the eye detection and tracking circuitry, the current eye status including at least one of eye opening, eye closing or eye movement based on at least one normalized image received from the face normalization circuitry.
 19. At least one machine-readable storage medium having stored thereon, individually or in combination, instructions for communicating using avatars that, when executed by one or more processors, cause the one or more processors to: receive from a user, via avatar selection circuitry, a selection of an avatar from a plurality of predefined avatars, the selected avatar to be animated on a remote device; transmit, using communication circuitry, the avatar selection to the remote device; capture, using camera and audio framework circuitry, at least one image; track, using face detection circuitry, the user's face in the at least one image and identify one or more facial characteristics in the face; determine, using eye detection and tracking circuitry in the face detection circuitry, current eye status based on the at least one image, determine changes in eye status based on the current eye status and formulate at least one instruction to cause the avatar to mimic any changes in eye status that were determined, the at least one instruction being transmitted to the remote device via the communication circuitry for use in animating of the avatar.
 20. The storage medium of claim 19, further comprising instructions for communicating using avatars that, when executed by one or more processors, cause the one or more processors to: capture, using at least one camera coupled to the camera and audio framework circuitry, the at least one image; capture, using at least one microphone coupled to the camera and audio framework circuitry, local sounds; reproduce, using at least one speaker coupled to the camera and audio framework circuitry, the local sounds to provide audio feedback to the user and remote sounds captured by the remote device.
 21. The storage medium of claim 19, further comprising instructions for communicating using avatars that, when executed by one or more processors, cause the one or more processors to: recognize, using the face detection circuitry, an expression by classifying the one or more facial characteristics, the at least one instruction at least indicating the expression to the remote device for use in animating the avatar.
 22. The storage medium of claim 19, further comprising instructions for communicating using avatars that, when executed by one or more processors, cause the one or more processors to: display, using a display, at least a feedback avatar replicating the animation of the avatar on the remote device.
 23. The storage medium of claim 19, further comprising instructions for communicating using avatars that, when executed by one or more processors, cause the one or more processors to: normalize, using face normalization circuitry in the face detection circuitry, a face detected in the at least one image.
 24. The storage medium of claim 23, wherein the instructions to normalize comprise instructions to at least rotate the at least one image to align eyes within a face captured in the image, crop the at least one image corresponding to a size of the face, scale the at least one image to make a distance between the eyes constant, apply a mask to zero out pixels not within an oval that would contain a typical face, histogram equalize the at least one image to smooth the distribution of gray values for non-masked pixels, and normalize the at least one image so the non-masked pixels have zero mean and a standard deviation of one.
 25. The storage medium of claim 23, further comprising instructions for communicating using avatars that, when executed by one or more processors, cause the one or more processors to: determine, using eye classification circuitry in the eye detection and tracking circuitry, the current eye status including at least one of eye opening, eye closing or eye movement based on at least one normalized image received from the face normalization circuitry.